This storyboard presents the results of all 12 case studies in the book “Text Mining: An Uncharted Territory for Librarians”. Labels such as 1A, 1B, and 4B identify each case study: the number is the chapter, and the letter distinguishes the tool used for the analysis. For instance, 1A shows the Chapter 1 clustering results from the Orange tool, and 1B shows the Chapter 1 clustering results from R. For more about the case studies and the methodology behind the results, see the book, available from Springer.
The heatmap plot shows the distances between the documents.
The clustered heatmap plot shows another way to visualize the distances between the documents.
The dendrogram presents the hierarchical clustering of documents using Ward's method.
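As a rough illustration of what Ward's method does when it builds such a dendrogram, the sketch below repeatedly merges the two clusters whose union gives the smallest increase in within-cluster variance. It is not the Orange workflow used in the case study: the function names (`ward_cost`, `ward_merges`) and the toy term-count vectors are assumptions made for this example only.

```python
from itertools import combinations

def centroid(points):
    """Mean vector of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq

def ward_merges(vectors):
    """Return the merge order as (member indices, merge cost) pairs."""
    clusters = [[i] for i in range(len(vectors))]
    merges = []

    def cost(pair):
        a, b = pair
        return ward_cost([vectors[i] for i in a], [vectors[i] for i in b])

    while len(clusters) > 1:
        # Pick the pair of clusters that is cheapest to merge.
        a, b = min(combinations(clusters, 2), key=cost)
        merges.append((a + b, cost((a, b))))
        clusters = [c for c in clusters if c is not a and c is not b] + [a + b]
    return merges

# Hypothetical term-count vectors for four documents (illustrative only).
doc_vectors = [[2, 0, 1], [3, 0, 1], [0, 4, 0], [0, 5, 1]]
merges = ward_merges(doc_vectors)
for members, merge_cost in merges:
    print(members, round(merge_cost, 2))
```

Each printed row corresponds to one junction in a dendrogram: the listed documents are joined at a height related to the merge cost.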
For clustering in R, the elbow method was used to determine the number of clusters.
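The elbow method can be sketched as follows: compute the within-cluster sum of squares (WCSS) for several values of k and look for the k after which the improvement flattens out. The case study used R; this Python sketch is only an illustration, and it assumes k-means as the clustering step, a simple deterministic initialisation, and made-up 2-D points. The helper names (`kmeans_wcss`, `mean_point`) are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))

def kmeans_wcss(points, k, iterations=20):
    """Run a basic k-means and return the within-cluster sum of squares."""
    # Assumption: deterministic start from evenly spaced input points.
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        centers = [mean_point(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sum(min(sq_dist(p, c) for c in centers) for p in points)

# Made-up data with two well-separated groups (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
wcss = {k: kmeans_wcss(points, k) for k in range(1, 5)}
for k, value in wcss.items():
    print(k, round(value, 3))
```

On this toy data the WCSS drops sharply from k = 1 to k = 2 and only marginally afterwards, so the "elbow" sits at two clusters.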
Euclidean distance was used to measure the distance between the documents.
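To make the distance measure concrete, the sketch below turns a few short texts into term-count vectors and computes the pairwise Euclidean distance between them. The case study itself was run in R; the sample texts, the whitespace tokenisation, and the helper names (`term_vector`, `euclidean`) are assumptions for illustration.

```python
import math
from collections import Counter

def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical mini-corpus (illustrative only).
docs = ["library data mining", "library text mining", "malaria vaccine trial"]
vocabulary = sorted({word for doc in docs for word in doc.lower().split()})
vectors = [term_vector(doc, vocabulary) for doc in docs]

# Symmetric matrix of pairwise distances between documents.
distances = [[euclidean(u, v) for v in vectors] for u in vectors]
for row in distances:
    print([round(d, 3) for d in row])
```

Documents that share more vocabulary end up closer together; a matrix like this is the kind of input that a heatmap or a hierarchical clustering would visualize.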
Hierarchical clustering with dendrograms is another way to visualize the distance between the documents.
A circular dendrogram is yet another way to visualize the distance between the documents.
A phylogenetic structure is another way of visualizing the same results from a different perspective, depending on your research problem and dataset.
Timeline showing the core topics in DESIDOC Journal of Library and Information Technology from 1981 to 2018 (©2019 Springer Nature, all rights reserved – reprinted with permission from Springer Nature, published in Lamba and Madhusudhan (2019))
Fifty core topics were identified for the corpus of 928 DJLIT research articles, of which only 29 were unique.
Latent Dirichlet Allocation Topic and Word Result for PQDT Global ETDs during 2014-2018 (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))
The results show the topics assigned to the corpus of ETDs.
The figure shows the results for 5 topics using Structural Topic Modeling (STM).
The figure shows a second way of representing the results from Method 1.
The figure shows a third way of representing the results from Methods 1 and 2.
The figure shows a fourth way of representing the results from Methods 1, 2, and 3.
The table presents the top five representative ETDs for the modeled topics, ranked according to their probability.
The figure shows the correlation between the topics using a network graph.
Word Co-Occurrence Network
The figure presents the word co-occurrence network for the top 50 words that represent the literature on malaria indexed in the Web of Science (WoS) database for the year 2019.
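A word co-occurrence network of this kind can be sketched as a weighted edge list: each edge links two frequent words, and its weight counts the documents in which both appear. The sketch below is an assumption-laden illustration, not the tool used in the case study; the sample records, the whitespace tokenisation, the use of document frequency to pick top words, and the function name `cooccurrence_edges` are all hypothetical, and `top_n` is set to 5 here instead of 50 to suit the toy data.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(documents, top_n):
    """Count, for each pair of top words, the documents containing both."""
    tokenized = [set(doc.lower().split()) for doc in documents]
    # Document frequency: in how many documents each word appears.
    freq = Counter(word for words in tokenized for word in words)
    top_words = {word for word, _ in freq.most_common(top_n)}
    edges = Counter()
    for words in tokenized:
        # Sorted pairs give each edge one canonical (a, b) key.
        for a, b in combinations(sorted(words & top_words), 2):
            edges[(a, b)] += 1
    return edges

# Made-up records standing in for titles or abstracts (illustrative only).
records = [
    "malaria parasite drug resistance",
    "malaria vaccine trial",
    "malaria parasite vaccine",
    "malaria drug resistance",
]
edges = cooccurrence_edges(records, top_n=5)
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

The resulting pairs and weights can be loaded into any network visualization tool as nodes and weighted edges.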
Text Network
The figure shows the 22 clusters (communities) of 238 words (nodes) identified through network text analysis of the data.
Screenshot of evaluation result (©2020 Cadernos BAD, all rights reserved – reprinted under Creative Commons CC BY license, published in Lamba and Madhusudhan (2020))
The screenshot shows the evaluation results for library science ETDs in the PQDT Global database.